Detecting community structure is fundamental to clarify the link between structure and function 
in complex networks and is used for practical applications in many disciplines. A successful method 
relies on the optimization of a quantity called modularity [Newman and Girvan, Phys. Rev. E 69, 
026113 (2004)], which is a quality index of a partition of a network into communities. We find that 
modularity optimization may fail to identify modules smaller than a scale which depends on the 
total number L of links of the network and on the degree of interconnectedness of the modules, 
even in cases where modules are unambiguously defined. The probability that a module conceals 
well-defined substructures is the highest if the number of links internal to the module is of the order 
of $\sqrt{2L}$ or smaller. We discuss the practical consequences of this result by analyzing partitions 
obtained through modularity optimization in artificial and real networks. 
I. INTRODUCTION 



Community detection in complex networks has at- 
tracted a lot of attention in the last years (for a re- 
view see 0,0). The main reason is that complex net- 
works @, 0,11a El are made of a large number of nodes 
and that so far most of the quantitative investigations 
were focusing on statistical properties disregarding the 
roles played by specific subgraphs. Detecting commu- 
nities (or modules) can then be a way to identify rele- 
vant substructures that may also correspond to impor- 
tant functions. In the case of the World Wide Web, for 
instance, communities are sets of Web pages dealing with 
the same topic [8]. Relevant community structures were 
also found in social networks |3, ^Ol , biochemical net- 
works 

EHHQ, the Internet 111, food webs 0, and 
in networks of sexual contacts [17]. 



Loosely speaking a community is a subgraph of a net- 
work whose nodes are more tightly connected with each 
other than with nodes outside the subgraph. A decisive 
advance in community detection was made by Newman 
and Girvan [T^|, who introduced a quantitative measure 
for the quality of a partition of a network into commu- 
nities, the so-called modularity. This measure essentially 
compares the number of links inside a given module with 
the expected value for a randomized graph of the same 
size and degree sequence. If one takes modularity as the 
relevant quality function, the problem of community de- 
tection becomes equivalent to modularity optimization. 
The latter is not trivial, as the number of possible par- 
titions of a network in clusters increases exponentially 
with the size of the network, making exhaustive opti- 
mization computationally unreachable even for relatively 
small graphs. Therefore, a number of algorithms have 



been devised in order to find a good optimization with 
the least computational cost. The fastest available procedures 
use greedy techniques [HI H3 and extremal optimization [21], 
and are at present the only algorithms 
capable of detecting communities on large networks. 
More accurate results are obtained through simulated an- 
nealing H2,[2il, although this method is computationally 
very expensive. 

Modularity optimization seems thus to be a very effec- 
tive method to detect communities, both in real and in 
artificially generated networks. The modularity itself has 
however not yet been thoroughly investigated and only 
a few general properties are known. For example, it is 
known that the modularity value of a partition does not 
have a meaning by itself, but only if compared with the 
corresponding modularity expected for a random graph 
of the same size [24], as the latter may attain very high 
values, due to fluctuations [22]. 

In this paper we focus on communities defined by mod- 
ularity. We will show that modularity contains an intrin- 
sic scale which depends on the number of links of the 
network, and that modules smaller than that scale may 
not be resolved, even if they were complete graphs con- 
nected by single bridges. The resolution limit of modular- 
ity actually depends on the degree of interconnectedness 
between pairs of communities and can reach values of the 
order of the size of the whole network. It is thus a priori 
impossible to tell whether a module (large or small), ob- 
tained through modularity optimization, is indeed a sin- 
gle module or a cluster of smaller modules. This result 
thus introduces some caveats in the use of modularity to 
detect community structure. 

In Section II we recall the notion of modularity and 
discuss some of its properties. Section III deals with 
the problem of finding the most modular network with a 
given number of nodes and links. In Section IV we show 
how the resolution limit of modularity arises. In Section V 
we illustrate the problem with some artificially 
generated networks, and extend the discussion to real 
networks. Our conclusions are presented in Section VI. 



II. MODULARITY 

The modularity of a partition of a network in modules 
can be written as [HI 
$$Q = \sum_{s=1}^{m}\left[\frac{l_s}{L} - \left(\frac{d_s}{2L}\right)^2\right], \tag{1}$$



where the sum is over the m modules of the partition, 
$l_s$ is the number of links inside module s, L is the total 
number of links in the network, and $d_s$ is the total degree 
of the nodes in module s. The first term of the summands 
in Eq. (1) is the fraction of links inside module s; the 
second term instead represents the expected fraction of 
links in that module if links were located at random in 
the network (under the only constraint that the degree 
sequence coincides with that in the original graph). If for 
a subgraph S of a network the first term is much larger 
than the second, it means that there are many more links 
inside S than one would expect by random chance, so 
S is indeed a module. The comparison with the null 
model represented by the randomized network leads to 
the quantitative definition of community embedded in 
the ansatz of Eq. (1). We conclude that, in a modularity-based 
framework, a subgraph S with $l_s$ internal links and 
total degree $d_s$ is a module if 

$$\frac{l_s}{L} - \left(\frac{d_s}{2L}\right)^2 > 0. \tag{2}$$

Let us express the number of links $l_s^{out}$ joining nodes of 
the module s to the rest of the network in terms of $l_s$, 
i.e. $l_s^{out} = a\, l_s$ with $a \geq 0$. So, $d_s = 2 l_s + l_s^{out} = (a + 2)\, l_s$ 
and the condition (2) becomes 

$$\frac{l_s}{L} - \left[\frac{(a + 2)\, l_s}{2L}\right]^2 > 0, \tag{3}$$

from which, rearranging terms, one obtains 

$$l_s < \frac{4L}{(a + 2)^2}. \tag{4}$$



If a = 0, the subgraph S is a disconnected part of the 
network and is a module if $l_s < L$, which is always true. 
If a is strictly positive, Eq. (4) sets an upper limit to 
the number of internal links that S must have in order 
to be a module. This is a little odd, because it means 
that the definition of community implied by modularity 
depends on the size of the whole network, instead of 
involving a "local" comparison between the number of 
internal and external links of the module. For a < 2 one 
has $2 l_s > l_s^{out}$, which means that the total degree internal 
to the subgraph is larger than its external degree, 
i.e. $d_s^{in} > d_s^{out}$. The attributes "internal" and "external" 
here mean that the degree is calculated considering only 
the internal or the external links, respectively. In this 
case, the subgraph S would be a community according 
to the "weak" definition given by Radicchi et al. [25]. 

For a < 2 the right-hand side of inequality (4) is in 
the interval [L/4, L]. A subgraph of size $l_s$ would then 
be a community both within the modularity framework 
and according to the weak definition of Radicchi et al. 
if a < 2 and $l_s$ is less than a quantity in the interval 
[L/4, L]. Sufficient conditions for which these constraints 
are always met are then 

$$l_s < \frac{L}{4}, \qquad a < 2. \tag{5}$$

In Section IV we shall only consider modules of this kind. 
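
The module condition and the bound on the number of internal links can be checked numerically. The following is only a minimal sketch: the function names and the example numbers are illustrative assumptions, and the formulas used are the condition $l_s/L - (d_s/2L)^2 > 0$ with $d_s = 2 l_s + l_s^{out}$ and the bound $4L/(a+2)^2$.

```python
def is_module(l_s, l_out, L):
    """Modularity-based condition l_s/L - (d_s/(2L))^2 > 0, with d_s = 2*l_s + l_out."""
    d_s = 2 * l_s + l_out
    return l_s / L - (d_s / (2 * L)) ** 2 > 0


def max_internal_links(a, L):
    """Upper bound 4L/(a+2)^2 on l_s, where a = l_out / l_s."""
    return 4 * L / (a + 2) ** 2


# Hypothetical network with L = 100 links.
print(max_internal_links(0, 100))   # disconnected subgraph: bound equals L
print(max_internal_links(2, 100))   # a = 2: bound equals L/4
print(is_module(40, 40, 100))       # a = 1, below the bound 400/9
print(is_module(48, 48, 100))       # a = 1, above the bound 400/9
```

The two functions express the same condition in two forms, so for a given ratio a they should always agree on whether a subgraph qualifies.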

According to Eq. (2), a partition of a network into 
actual modules would have a positive modularity, as all 
summands in Eq. (1) are positive. On the other hand, 
for particular partitions, Q can take negative values. 
The network itself, meant as a partition 
with a single module, has modularity zero: in this 
case, in fact, $l_1 = L$, $d_1 = 2L$, and the two terms 
of the unique module in Q cancel each other. Usually, a 
value of Q larger than 0.3-0.4 is a clear indication that 
the subgraphs of the corresponding partition are modules. 
However, the maximal modularity differs from one 
network to another and depends on the number of links 
of the network. In the next section we shall derive the 
expression of the maximal possible value $Q_M(L)$ that Q 
can attain on a network with L links. We will prove that 
the upper limit for the value of modularity for any network 
is 1 and we will see why the modularity is not scale 
independent. 
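
As an illustration of the definition used in this section, modularity can be evaluated directly from a list of links and an assignment of nodes to modules. This is a minimal sketch for undirected, unweighted graphs; the function name `modularity` and the toy graph (two triangles joined by one link) are assumptions made for the example.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Q = sum over modules s of [ l_s/L - (d_s/(2L))^2 ]."""
    L = len(edges)
    internal = defaultdict(int)  # l_s: links with both ends in module s
    degree = defaultdict(int)    # d_s: total degree of the nodes in module s
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[s] / L - (degree[s] / (2 * L)) ** 2 for s in degree)


# Toy graph: two triangles connected by a single link (L = 7).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {v: "all" for v in range(6)}

q_two = modularity(edges, two_modules)  # each triangle: l_s = 3, d_s = 7
q_one = modularity(edges, one_module)   # whole network as a single module
print(q_two, q_one)
```

For the single-module partition the two terms cancel, so the second value is zero, in agreement with the remark above.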



III. THE MOST MODULAR NETWORK 

In this section we discuss the most modular network, 
which will naturally introduce the problem of scales 
in modularity optimization. In Ref. [2], the authors 
consider the interesting example of a network made of 
m identical complete graphs (or 'cliques'), disjoint from 
each other. In this case, the modularity is maximal for 
the partition of the network in the cliques and is given 
by the sum of m equal terms. In each clique there are 
l = L/m links, and the total degree is d = 2l, as there 
are no links connecting nodes of the clique to the other 
cliques. We thus obtain 



$$Q = m\left[\frac{l}{L} - \left(\frac{2l}{2L}\right)^2\right] = m\left(\frac{1}{m} - \frac{1}{m^2}\right) = 1 - \frac{1}{m}, \tag{6}$$



which converges to 1 when the number of cliques goes to 
infinity. We remark that for this result to hold it is not 
necessary that the m connected components be cliques. 
The number of nodes of the network and within the modules 
does not affect modularity. If we have m modules, 
we just need to have L/m links inside each module, as 
long as this is compatible with topological constraints, 
like connectedness. In this way, a network composed of 
m identical trees (in graph theory, a forest) has the same 
maximal modularity reported in Eq. (6), although it has 
a far smaller number of links as compared with the case 
of the densely connected cliques (for a given number of 
nodes). 

FIG. 1: Design of a connected network with maximal modularity. 
The modules (circles) must be connected to each other 
by the minimal number of links. 
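
The statement that m disjoint identical components give Q = 1 - 1/m, whether they are cliques or trees, can be verified with a few lines. This is a sketch under the assumption that every component carries the same number of links; the function name is hypothetical.

```python
def q_equal_components(m, links_per_component):
    """Modularity of m disjoint components with l links each: m * [l/L - (2l/(2L))^2]."""
    l = links_per_component
    L = m * l
    return m * (l / L - (2 * l / (2 * L)) ** 2)


# Four components with 4 nodes each: complete graphs K_4 have 6 links, trees have 3.
q_cliques = q_equal_components(4, 6)
q_trees = q_equal_components(4, 3)
print(q_cliques, q_trees)
```

Both calls return the same value, since only the number of components enters the result.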

A further interesting question is how to design a connected 
network with N nodes and L links which maximizes 
modularity. To address this issue, we proceed in 
two steps: first, we consider the maximal value $Q_M(m, L)$ 
for a partition into a fixed number m of modules; after 
that, we look for the number $m^*$ that maximizes 
$Q_M(m, L)$. 

Let us first consider a partition into m modules. Ideally, 
to maximize the contribution to modularity of each 
module, we should reduce as much as possible the number 
of links connecting modules. If we want to keep the network 
connected, the smallest number of inter-community 
links must be m - 1. For the sake of clarity, and to simplify 
the mathematical expressions (without affecting the 
final result), we assume instead that there are m links 
between the modules, so that we can arrange the latter 
in the simple ring-like configuration illustrated in Fig. 1. 
The modularity of such a network is 

$$Q = \sum_{s=1}^{m}\left[\frac{l_s}{L} - \left(\frac{2 l_s + 2}{2L}\right)^2\right], \tag{7}$$

where 

$$\sum_{s=1}^{m} l_s = L - m. \tag{8}$$
It is easy to see that the expression of Eq. (7) reaches its 
maximum when all modules contain the same number 
of links, i.e. $l_s = l = L/m - 1$, $\forall s = 1, 2, \ldots, m$. The 
maximum is then given by 

$$Q_M(m, L) = m\left[\frac{L/m - 1}{L} - \left(\frac{L/m}{L}\right)^2\right] = 1 - \frac{m}{L} - \frac{1}{m}. \tag{9}$$

We have now to find the maximum of $Q_M(m, L)$ when 
the number of modules m is variable. For this purpose 
we treat m as a real variable and take the derivative of 
$Q_M(m, L)$ with respect to m 

$$\frac{dQ_M(m, L)}{dm} = -\frac{1}{L} + \frac{1}{m^2}, \tag{10}$$

which vanishes when $m = m^* = \sqrt{L}$. This point indeed 
corresponds to the absolute maximum $Q_M(L)$ of 
the function $Q_M(m, L)$. This result coincides with the 
one found by the authors of [22] for a one-dimensional 
lattice, but our proof is completely general and does not 
require preliminary assumptions on the type of network 
and modules. 

Since m must be an integer, the actual maximum is 
reached when m equals one of the two integers closest to 
$m^*$, but that is not important for our purpose, so from 
now on we shall stick to the real-valued expressions, their 
meaning being clear. The maximal modularity is then 



$$Q_M(L) = Q_M(m^*, L) = 1 - \frac{2}{\sqrt{L}}, \tag{11}$$



and approaches 1 if the total number of links L goes to infinity. 
The corresponding number of links in each module 
is $l = \sqrt{L} - 1$. The fact that all modules have the same 
number of links does not imply that they also have the 
same number of nodes. Again, modularity does not depend 
on the distribution of the nodes among the modules 
as long as the topological constraints are satisfied. For 
instance, if we assume that the modules are connected 
graphs, there must be at most $n = l + 1 = \sqrt{L}$ nodes in 
each module. The crucial point here is that modularity 
seems to have some intrinsic scale of order $\sqrt{L}$, which 
constrains the number and the size of the modules. For 
a given total number of nodes and links we could build 
many more than $\sqrt{L}$ modules, but the corresponding network 
would be less "modular", namely with a value of the 
modularity lower than the maximum of Eq. (11). This 
fact is the basic reason why small modules may not be 
resolved through modularity optimization, as will be 
clear in the next section. 
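
The location of the maximum can be checked by brute force. The sketch below assumes the ring configuration with m equal modules, each with L/m - 1 internal links, for which the modularity is 1 - m/L - 1/m; the function names and the value L = 10000 are illustrative choices.

```python
import math


def q_ring(m, L):
    """Modularity 1 - m/L - 1/m of m equal modules arranged in a ring."""
    return 1.0 - m / L - 1.0 / m


def best_number_of_modules(L):
    """Integer m in 1..L that maximizes q_ring(m, L), found by exhaustive search."""
    return max(range(1, L + 1), key=lambda m: q_ring(m, L))


L = 10_000  # hypothetical number of links
m_star = best_number_of_modules(L)
q_max = q_ring(m_star, L)
print(m_star, q_max, 1 - 2 / math.sqrt(L))
```

Here L is a perfect square, so the integer optimum coincides with the real-valued one; for other values of L the search returns one of the two integers closest to the square root of L.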



IV. THE RESOLUTION LIMIT 

FIG. 2: Scheme of a network partition into three or more 
modules. The two circles on the left picture two modules, 
the oval on the right represents the rest of the network $\mathcal{M}_0$, 
whose structure is arbitrary. 

We analyze a network with L links and with at least 
three modules, in the sense of the definition of formula (5) 
(Fig. 2). We focus on a pair of modules, $\mathcal{M}_1$ and $\mathcal{M}_2$, and 
distinguish three types of links: those internal to each of 
the two communities ($l_1$ and $l_2$, respectively), those between 
$\mathcal{M}_1$ and $\mathcal{M}_2$ ($l_{int}$), and those between the two communities 
and the rest of the network $\mathcal{M}_0$ ($l_1^{out}$ and $l_2^{out}$). In order 
to simplify the calculations we express the numbers of 
external links in terms of $l_1$ and $l_2$, so $l_{int} = a_1 l_1 = a_2 l_2$, 
$l_1^{out} = b_1 l_1$ and $l_2^{out} = b_2 l_2$, with $a_1, a_2, b_1, b_2 \geq 0$. Since 
$\mathcal{M}_1$ and $\mathcal{M}_2$ are modules by construction, we also have 
$a_1 + b_1 < 2$, $a_2 + b_2 < 2$ and $l_1, l_2 < L/4$ (see Section II). 
Now we consider two partitions A and B of the 
network. In partition A, $\mathcal{M}_1$ and $\mathcal{M}_2$ are taken as separate 
modules, and in partition B they are considered 
as a single community. The split of the rest of the network 
is arbitrary but identical in both partitions. We 
want to compare the modularity values $Q_A$ and $Q_B$ of 
the two partitions. Since the modularity is a sum over 
the modules, the contribution of $\mathcal{M}_0$ is the same in both 
partitions and is denoted by $Q_0$. From Eq. (1) we obtain 



$$Q_A = Q_0 + \frac{l_1}{L} - \left[\frac{(a_1 + b_1 + 2)\, l_1}{2L}\right]^2 + \frac{l_2}{L} - \left[\frac{(a_2 + b_2 + 2)\, l_2}{2L}\right]^2, \tag{12}$$

$$Q_B = Q_0 + \frac{l_1 + l_2 + a_1 l_1}{L} - \left[\frac{(2 a_1 + b_1 + 2)\, l_1 + (b_2 + 2)\, l_2}{2L}\right]^2. \tag{13}$$

The difference $\Delta Q = Q_B - Q_A$ is 

$$\Delta Q = \frac{2 L a_1 l_1 - (a_1 + b_1 + 2)(a_2 + b_2 + 2)\, l_1 l_2}{2 L^2}. \tag{14}$$

As $\mathcal{M}_1$ and $\mathcal{M}_2$ are both modules by construction, we 
would expect that the modularity should be larger for 
the partition where the two modules are separated, i.e. 
$Q_A > Q_B$, which in turn implies $\Delta Q < 0$. From Eq. (14) 
we see that $\Delta Q$ is negative if 

$$l_2 > \frac{2 L a_1}{(a_1 + b_1 + 2)(a_2 + b_2 + 2)}. \tag{15}$$

We see that if $a_1 = a_2 = 0$, i.e. if there are no links 
between $\mathcal{M}_1$ and $\mathcal{M}_2$, the above condition is trivially 
satisfied. Instead, if the two modules are connected to 
each other, something interesting happens. Each of the 
coefficients $a_1, a_2, b_1, b_2$ cannot exceed 2, and $l_1$ and $l_2$ are 
both smaller than L/4 by construction but can be taken 
as small as we wish with respect to L. In this way, it is 
possible to choose $l_1$ and $l_2$ such that the inequality of 
Eq. (15) is not satisfied. In such a situation we can have 
$\Delta Q > 0$, and the modularity of the configuration where 
the two modules are considered as a single community 
is larger than that of the partition where the two modules are 
clearly identified. This implies that by looking for the 
maximal modularity, there is the risk of missing important 
structures at smaller scales. To give an idea of the size 
of $l_1$ and $l_2$ at which modularity optimization could fail, 
we consider for simplicity the case in which $\mathcal{M}_1$ and $\mathcal{M}_2$ 
have the same number of links, i.e. $l_1 = l_2 = l$. The 
condition on l for the modularity to miss the two modules 
also depends on the fuzziness of the modules, as expressed 
by the values of the parameters $a_1, a_2, b_1, b_2$. In order 
to find the range of potentially "dangerous" values of l, 
we consider the two extreme cases in which 

• the two modules have a perfect balance between 
internal and external degree ($a_1 + b_1 = 2$, $a_2 + b_2 = 2$), 
so they are on the edge between being or not 
being communities, in the weak sense defined by 
Radicchi et al. [25]; 

• the two modules have the smallest possible external 
degree, which means that there is a single link connecting 
them to the rest of the network and only 
one link connecting each other ($a_1 = a_2 = b_1 = b_2 = 1/l$). 

In the first case, the maximum value that the coefficient 
of L can take in Eq. (15) is 1/4, when $a_1 = a_2 = 2$ and 
$b_1 \approx 0$, $b_2 \approx 0$, so we obtain that Eq. (15) may not be 
satisfied for 

$$l < l_R^{\max} = \frac{L}{4}, \tag{16}$$

which is a scale of the order of the size of the whole network. 
In this way, even a pair of large communities may 
not be resolved if they share enough links with the nodes 
outside them (in this case we speak of "fuzzy" communities). 
A more striking result emerges when we consider 
the other limit, i.e. when $a_1 = a_2 = b_1 = b_2 = 1/l$. In 
this case it is easy to check that Eq. (15) is not satisfied 
for values of the number of links inside the modules 
satisfying 

$$l < l_R^{\min} = \sqrt{\frac{L}{2}}. \tag{17}$$



If we now assume that we have two (interconnected) modules 
with the same number of internal links $l < l_R^{\min} < l_R^{\max}$, 
the discussion above implies that the modules cannot 
be resolved through modularity optimization, not 
even if they were complete graphs connected by a single 
link. As we have seen from Eq. (16), it is possible 
to miss modules of larger size, if they share more links 
with the rest of the network (and with each other). For 
$l_1 \neq l_2$ the conclusion is similar but the scales $l_R^{\min,\max}$ 
are modified by simple factors. 

FIG. 3: (A) A network made out of identical cliques (which 
are here complete graphs with m nodes) connected by single 
links. If the number of cliques is larger than about $\sqrt{L}$, 
modularity optimization would lead to a partition where the 
cliques are combined into groups of two or more (represented 
by a dotted line). (B) A network with four pairwise identical 
cliques (complete graphs with m and p < m nodes, 
respectively); if m is large enough with respect to p (e.g. 
m = 20, p = 5), modularity optimization merges the two 
smallest modules into one (shown with a dotted line). 
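
The sign of the modularity change on merging two modules can be explored numerically. The sketch below is an illustration under stated assumptions: `delta_q` implements $\Delta Q = [2 L a_1 l_1 - (a_1+b_1+2)(a_2+b_2+2) l_1 l_2]/(2L^2)$, `delta_q_direct` recomputes the same quantity from the individual modularity terms (it requires $a_1 l_1 = a_2 l_2$), and the function names and numerical values are hypothetical.

```python
import math


def delta_q(l1, l2, a1, b1, a2, b2, L):
    """Q_B - Q_A: modularity gain obtained by merging the two modules."""
    return (2 * L * a1 * l1 - (a1 + b1 + 2) * (a2 + b2 + 2) * l1 * l2) / (2 * L ** 2)


def delta_q_direct(l1, l2, a1, b1, a2, b2, L):
    """Same quantity from the terms l/L - (d/(2L))^2; assumes a1*l1 == a2*l2."""
    d1 = (a1 + b1 + 2) * l1
    d2 = (a2 + b2 + 2) * l2

    def term(l, d):
        return l / L - (d / (2 * L)) ** 2

    return term(l1 + l2 + a1 * l1, d1 + d2) - term(l1, d1) - term(l2, d2)


def merge_is_favoured(l, L):
    """Equal modules with one link between them and one link each to the rest (a = b = 1/l)."""
    a = b = 1.0 / l
    return delta_q(l, l, a, b, a, b, L) > 0


L = 20_000  # hypothetical total number of links, so sqrt(L/2) = 100
print(merge_is_favoured(50, L), merge_is_favoured(150, L))
```

With these numbers the merge is favoured for modules of 50 links and not for modules of 150 links, on either side of the scale set by the square root of L/2.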



V. CONSEQUENCES 

We begin with a very schematic example, for illustrative 
purposes. In Fig. 3(A) we show a network consisting 
of a ring of cliques, connected through single links. 
Each clique is a complete graph $K_m$ with m nodes and 
has m(m - 1)/2 links. If we assume that there are n 
cliques, with n even, the network has a total of N = nm 
nodes and L = nm(m - 1)/2 + n links. The network has 
a clear modular structure where the communities correspond 
to single cliques and we expect that any detection 
algorithm should be able to detect these communities. 
The modularity $Q_{single}$ of this natural partition can be 
easily calculated and equals 



$$Q_{single} = 1 - \frac{2}{m(m - 1) + 2} - \frac{1}{n}. \tag{18}$$



On the other hand, the modularity $Q_{pairs}$ of the partition 
in which pairs of consecutive cliques are considered 
as single communities (as shown by the dotted lines in 
Fig. 3(A)) is 

$$Q_{pairs} = 1 - \frac{1}{m(m - 1) + 2} - \frac{2}{n}. \tag{19}$$

The condition $Q_{single} > Q_{pairs}$ is satisfied if and only if 

$$m(m - 1) + 2 > n. \tag{20}$$

In this example, m and n are independent variables and 
we can choose them such that the inequality of formula 
(20) is not satisfied. For instance, for m = 5 and n = 30, 
$Q_{single} = 0.876$ and $Q_{pairs} = 0.888 > Q_{single}$. An efficient 
algorithm looking for the maximum of the modularity 
would find the configuration with pairs of cliques and 
not the actual modules. The difference $Q_{pairs} - Q_{single}$ 
would be even larger if n increases, for m fixed. 
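
The two values quoted for m = 5 and n = 30 can be reproduced by building the ring of cliques explicitly and evaluating modularity on both partitions. This is a minimal sketch; the helper names and the choice of which nodes carry the inter-clique links are assumptions of the example and do not affect the result.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, membership):
    """Q = sum over modules s of [ l_s/L - (d_s/(2L))^2 ]."""
    L = len(edges)
    internal, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[s] / L - (degree[s] / (2 * L)) ** 2 for s in degree)


def ring_of_cliques(n, m):
    """n cliques K_m; clique c is joined to clique c+1 (cyclically) by one link."""
    edges = []
    for c in range(n):
        edges.extend(combinations(range(c * m, (c + 1) * m), 2))
        edges.append((c * m, ((c + 1) % n) * m + 1))
    return edges


n, m = 30, 5
edges = ring_of_cliques(n, m)                    # L = n*m*(m-1)/2 + n = 330
single = {v: v // m for v in range(n * m)}        # one module per clique
pairs = {v: v // (2 * m) for v in range(n * m)}   # consecutive cliques merged
q_single = modularity(edges, single)
q_pairs = modularity(edges, pairs)
print(round(q_single, 3), round(q_pairs, 3))
```

The partition into pairs has the higher modularity even though each clique is a complete graph connected to its neighbours by single links.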

The example we considered was particularly simple 
and hardly represents situations found in real networks. 
However, the initial configuration that we considered in 
the previous section (Fig. 2) is absolutely general, and the 
results make us free to design arbitrarily many networks 
with obvious community structures in which modularity 
optimization does not recognize (some of) the real modules. 
Another example is shown in Fig. 3(B). The circles 
represent again cliques, i.e. complete graphs: the two on 
the left have m nodes each, the other two p < m nodes. 
If we take m = 20 and p = 5, the maximal modularity 
of the network corresponds to the partition in which the 
two smaller cliques are merged [as shown by the dotted 
line in Fig. 3(B)]. This trend of the optimal modularity to 
group small modules has already been remarked in |2q, 
but as a result of empirical studies on special networks, 
without any complete explanation. 

In general, we cannot make any definite statement 
about modules found through modularity optimization 
without a method which verifies whether the modules 
are indeed single communities or a combination of com- 
munities. It is then necessary to inspect the structure of 
each of the modules found. As an example, we take the 
network of Fig. [3f A), with n = 30 identical cliques, where 
each clique is a K m with m = 5. As already said above, 
modularity optimization would find modules which are 
pairs of connected cliques. By inspecting each of the 
modules of the 'first generation' (by optimizing modu- 
larity, for example), we would ultimately find that each 
module is actually a set of two cliques. 

We thus have seen that modules identified through 
modularity optimization may actually be combinations 
of smaller modules. During the process of modularity 
optimization, it is favorable to merge connected modules 
if they are sufficiently small. 

We have seen in the previous Section that any two 
interconnected modules, fuzzy or not, are merged if the 
number of links inside each of them does not exceed $l_R^{\min}$. 
This means that the largest structure one can form by 
merging a pair of modules of any type (including cliques) 
has at least $2\, l_R^{\min}$ internal links. By reversing the argument, 
we conclude that if modularity optimization finds 
a module S with $l_S$ internal links, it may be that the latter 
is a combination of two or more smaller communities 
if 

$$l_S < 2\, l_R^{\min} = \sqrt{2L}. \tag{21}$$

This example is an extreme case, in which the internal 
partition of S can be arbitrary, as long as the pieces are 
modules in the sense discussed in Section II. Under the 
condition (21), the module could in principle be a cluster 
of loosely interconnected complete graphs. 

On the other hand, the upper limit of $l_S$ can be much 
larger than $\sqrt{2L}$, if the substructures are on average more 
interconnected with each other, as we have seen in Section IV. 
In fact, fuzzy modules can be combined with 
each other even if they contain many more than $l_R^{\min}$ 
links. The more interconnected the modules, the larger 
will be the resulting supermodule. In the extreme case 
in which all submodules are very fuzzy, the size $l_S$ of 
the supermodule could be in principle as large as that of 
the whole network, i.e. $l_S < L$. This result comes from 
the extreme case where the network is split in two very 
fuzzy communities, with L/4 internal links each and L/2 
between them. By virtue of Eq. (16), it is favorable (or 
just as good) to merge the two modules and the resulting 
structure is the whole network. This limit $l_S < L$ is of 
course always satisfied but suggests here that it is important 
to carefully analyze all modules found through 
modularity optimization, regardless of their size. 

The probability that a very large module conceals sub- 
structures is however small, because that could only hap- 
pen if all hidden submodules are very fuzzy communities, 
which is unlikely. Instead, modules with a size $l_S \sim \sqrt{2L}$ 
or smaller can result from an arbitrary merge of smaller 
structures, which may go from loosely interconnected 
cliques to very fuzzy communities. Modularity optimiza- 
tion is most likely to fail in these cases. 
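
In practice this criterion can be used as a screening step on the output of any modularity optimization. The sketch below flags the modules whose number of internal links is below the square root of 2L; the function name and the toy partition are assumptions, and a flagged module is only a candidate for closer inspection, not a proven merge.

```python
import math
from collections import Counter


def modules_to_inspect(edges, membership):
    """Labels of modules whose internal links l_S satisfy l_S < sqrt(2L)."""
    L = len(edges)
    internal = Counter(
        membership[u] for u, v in edges if membership[u] == membership[v]
    )
    limit = math.sqrt(2 * L)
    return sorted(s for s in set(membership.values()) if internal[s] < limit)


# Toy graph: two triangles joined by one link (L = 7, sqrt(2L) is about 3.74).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {v: "all" for v in range(6)}

print(modules_to_inspect(edges, two_modules))  # both triangles have l_S = 3
print(modules_to_inspect(edges, one_module))   # the whole network has l_S = 7
```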

In order to illustrate this theoretical discussion, we an- 
alyze five examples of real networks: 

1. the transcriptional regulation network of Saccha- 
romyces cerevisiae (yeast); 

2. the transcriptional regulation network of Es- 
cherichia coli; 

3. a network of electronic circuits; 

4. a social network; 

5. the neural network of Caenorhabditis elegans. 

We downloaded the lists of edges of the first four networks 
from Uri Alon's Website |2flj, whereas the last one 
was downloaded from the website of the Collective Dynamics 
Group at Columbia University [30]. 

In the transcriptional regulation networks, nodes represent 
operons, i.e. groups of genes that are transcribed 
onto the same mRNA, and an edge is set between two 
nodes A and B if A activates B. These systems have 
been previously studied to identify motifs in complex networks [28]. 
There are 688 nodes and 1,079 links for yeast, and 
423 nodes and 519 links for E. coli. Electronic circuits 
can be viewed as networks in which vertices are electronic 
components (like capacitors, diodes, etc.) and connections 
are wires. Our network maps one of the benchmark 
circuits of the so-called ISCAS'89 set; it has 512 nodes and 
819 links. In the social network we considered, nodes are 
people of a group and links represent positive sentiments 
directed from one person to another, based on questionnaires: 
it has 67 nodes and 182 links. Finally, the neural 
network of C. elegans is made of 306 nodes (neurons), 
connected through 2,345 links (synapses, gap junctions). 
We remark that most of these networks are directed; here 
we considered them as undirected. 

First, we look for the modularity maximum by using 
simulated annealing. We adopt the same recipe introduced 
in Ref. [13], which makes the optimization procedure 
very effective. There are two types of moves to pass 
from a network partition to the next: individual moves, 
where a single node is passed from a community to another, 
and collective moves, where a pair of communities 
is merged into a single one or, vice versa, a community 
is split into two parts. Each iteration at the same temperature 
consists of a succession of $N^2$ individual and N 
collective moves, where N is the total number of nodes 
of the network. The initial temperature T and the temperature 
reduction factor f are arbitrarily tuned to find 
the highest possible modularity: in most cases we took 
T ~ 1 and f between 0.9 and 0.99. 
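
To make the procedure concrete, the following is a deliberately simplified sketch of simulated annealing for modularity, not the recipe of Ref. [13]: it uses only individual moves (the collective merge and split moves are omitted), recomputes Q from scratch at every step, and uses a standard Metropolis acceptance rule. The function names, the stopping temperature `t_min` and the toy graph are assumptions.

```python
import math
import random
from collections import defaultdict


def modularity(edges, membership):
    """Q = sum over modules s of [ l_s/L - (d_s/(2L))^2 ]."""
    L = len(edges)
    internal, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[s] / L - (degree[s] / (2 * L)) ** 2 for s in degree)


def anneal(edges, nodes, t_start=1.0, cooling=0.9, t_min=1e-3, seed=0):
    """Return the best partition visited and its modularity (individual moves only)."""
    rng = random.Random(seed)
    current = {v: v for v in nodes}           # start from singleton modules
    q_cur = modularity(edges, current)
    best, q_best = dict(current), q_cur
    t = t_start
    while t > t_min:
        for _ in range(len(nodes) ** 2):      # N^2 individual moves per temperature
            v = rng.choice(nodes)
            target = current[rng.choice(nodes)]   # module of a randomly chosen node
            if target == current[v]:
                continue
            old = current[v]
            current[v] = target
            q_new = modularity(edges, current)
            # Metropolis rule: always accept improvements, sometimes accept losses.
            if q_new >= q_cur or rng.random() < math.exp((q_new - q_cur) / t):
                q_cur = q_new
                if q_cur > q_best:
                    best, q_best = dict(current), q_cur
            else:
                current[v] = old              # reject the move
        t *= cooling                          # temperature reduction factor
    return best, q_best


# Toy graph: two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
nodes = list(range(6))
best, q_best = anneal(edges, nodes)
print(q_best, best)
```

Recomputing Q at each move is only practical for very small graphs; an efficient implementation would update the modularity incrementally and include the collective moves described above.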

We found that all networks are characterized by high 
modularity peaks, with $Q_{max}$ ranging from 0.4022 (C. 
elegans) to 0.7519 (E. coli). The corresponding optimal 
partitions consist of 9 (yeast), 27 (E. coli), 11 (electronic), 
10 (social) and 4 (C. elegans) modules (for E. 
coli our results differ from, but are not inconsistent with, those 
obtained in [13] for a slightly different database; these differences 
however do not affect our conclusions). In order 
to check if the communities have a substructure, we used 
again modularity optimization, by constraining it to each 
of the modules found. In all cases, we found that most 
modules themselves displayed a clear community structure, 
with very high values of Q. The total number of 
submodules is 57 (yeast), 76 (E. coli), 70 (electronic), 
21 (social) and 15 (C. elegans), and is far larger than 
the corresponding number at the modularity peaks. The 
analysis of course is necessarily biased by the fact that 
we neglect all links between the original communities, 
and it may be that the submodules we found are not 
real modules for the original network. In order to verify 
that, we need to check whether the condition of Eq. (2) 
is satisfied or not for each submodule, and we found that 
it is the case. A further inspection of the communities 
found through modularity optimization thus reveals that 
they are, in fact, clusters of smaller modules. The modularity 
values corresponding to the partitions of the networks 
in the submodules are clearly smaller than the peak 
modularities that we originally found through simulated 
annealing (see Table I). By restricting modularity optimization 
to a module we have no guarantee that we 
accurately detect its substructure and that this is a safe 
way to proceed. Nevertheless, we have verified that all 
substructures we detected are indeed modules, so our results 
show that the search for the modularity optimum 
is not equivalent to the detection of communities defined 
through Eq. (2). 
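
The verification step described above, in which each candidate submodule is tested against the module condition using the links of the whole network rather than those of the enclosing module alone, can be sketched as follows. The function name and the toy network (three triangles in a ring) are assumptions made for illustration.

```python
def is_module_in_network(nodes, edges):
    """Condition l_s/L - (d_s/(2L))^2 > 0, with l_s and d_s measured in the full network."""
    nodes = set(nodes)
    L = len(edges)
    l_s = sum(1 for u, v in edges if u in nodes and v in nodes)
    d_s = sum((u in nodes) + (v in nodes) for u, v in edges)
    return l_s / L - (d_s / (2 * L)) ** 2 > 0


# Toy network: three triangles joined in a ring by single links (L = 12).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (6, 7), (7, 8), (6, 8), (2, 3), (5, 6), (8, 0)]

# Candidate substructure of a merged module made of the first two triangles.
submodules = [[0, 1, 2], [3, 4, 5]]
checks = [is_module_in_network(s, edges) for s in submodules]
print(checks)
```

Because the degrees are counted in the full network, the links that were neglected while optimizing inside the enclosing module are taken into account again at this stage.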



network            # modules (Q_max)    total # of modules (Q)
Yeast              9 (0.7396)           57 (0.6770)
E. coli            27 (0.7519)          76 (0.6615)
Electr. circuit    11 (0.6701)          70 (0.6401)
Social             10 (0.6079)          21 (0.5316)
C. elegans         4 (0.4022)           15 (0.3613)




TABLE I: Results of the modularity analysis on real net- 
works. In the second column, we report the number of mod- 
ules detected in the partition obtained for the maximal modu- 
larity. These modules however contain submodules and in the 
third column we report the total number of submodules we 
found and the corresponding value of the modularity of the 
partition, which is lower than the peak modularity initially 
found. 

The networks we have examined are fairly small but 
the problem we pointed out can only get worse if we 
increase the network size, especially when small communities 
coexist with large ones and the module size distribution 
is broad, which happens in many cases [2(1 127^. 
As an example, we take the recommendation network of 
the online seller Amazon.com. When a customer buys a product, 
Amazon recommends items which have been purchased 
by people who bought the same product. In this way it 
is possible to build a network in which the nodes are the 
items (books, music), and there is an edge between two 
items A and B if B was frequently purchased by buyers 
of A. Such a network was examined in Ref. |2£| and 
is very large, with 409,687 nodes and 2,464,630 edges. 
The authors analyzed the community structure by greedy 
modularity optimization, which is not necessarily accurate 
but represents the only strategy currently available 
for large networks. They identified 1,684 communities 
whose size distribution is well approximated by a power 
law with exponent 2. From the size distribution, we estimated 
that over 95% of the modules have sizes below the 
limit of Eq. (21), which implies that basically all modules 
need to be further investigated. 
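
For reference, the scale given by the square root of 2L can be evaluated for the networks discussed in this section, using the numbers of links quoted in the text. This is a simple arithmetic sketch; the dictionary keys are labels chosen for the example.

```python
import math

# Number of links L quoted in the text for each network.
links = {
    "yeast": 1079,
    "E. coli": 519,
    "electronic circuit": 819,
    "social": 182,
    "C. elegans": 2345,
    "Amazon": 2464630,
}

# Scale sqrt(2L) below which a detected module may hide smaller modules.
thresholds = {name: math.sqrt(2 * L) for name, L in links.items()}

for name, value in thresholds.items():
    print(f"{name:>20s}: l_S < {value:.1f}")
```

The threshold grows only as the square root of the number of links, so it stays small compared with the size of the network even for the Amazon graph.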



VI. CONCLUSIONS 

In this article we have analyzed in detail modularity 
and its applicability to community detection. We have 
found that the definition of community implied by modularity 
is actually not consistent with its optimization, 
which may favour network partitions with groups of modules 
combined into larger communities. We could say 
that, by enforcing modularity optimization, the possible 
partitions of the system are explored at a coarse level, 
so that modules smaller than some scale may not be resolved. 
The resolution limit of modularity does not rely 
on particular network structures, but only on the comparison 
between the sizes of interconnected communities 
and that of the whole network, where the sizes are measured 
by the number of links. 

The origin of the resolution scale lies in the fact that 
modularity is a sum of terms, where each term corresponds 
to a module. Finding the maximal modularity is 
then equivalent to looking for the ideal tradeoff between the 
number of terms in the sum, i.e. the number of modules, 
and the value of each term. An increase of the 
number of modules does not necessarily correspond to 
an increase in modularity because the modules would be 
smaller and so would be each term of the sum. This is 
why for some characteristic number of terms the modularity 
has a peak. The problem is that this "optimal" 
partition, imposed by mathematics, is not necessarily 
correlated with the actual community structure of the 
network, where communities may be very heterogeneous 
in size, especially if the network is large. 

Our result implies that modularity optimization might 
miss important substructures of a network, as we have 
confirmed in real world examples. From our discussion 
we deduce that it is not possible to exclude that modules 
of virtually any size may be clusters of modules, although 
the problem is most likely to occur for modules with a 
number of internal links of the order of $\sqrt{2L}$ or smaller. 
For this reason, it is crucial to check the structure of 
all detected modules, for instance by constraining modularity 
optimization on each single module, a procedure 
which is not safe but may give useful indications. 

The fact that quality functions such as the modularity 
have an intrinsic resolution limit calls for a new theoretical 
framework which focuses on a local definition of 
community, regardless of its size. Quality functions are 
still helpful, but their role should probably be limited to 
the comparison of partitions with the same number of 
modules. 